AITopics | llm chatbot


llm chatbot

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey

Zhu, Jiachen, Zhu, Menghui, Rui, Renting, Shan, Rong, Zheng, Congmin, Chen, Bo, Xi, Yunjia, Lin, Jianghao, Liu, Weiwen, Tang, Ruiming, Yu, Yong, Zhang, Weinan

arXiv.org Artificial Intelligence, Jun-16-2025

The advent of large language models (LLMs), such as GPT, Gemini, and DeepSeek, has significantly advanced natural language processing, giving rise to sophisticated chatbots capable of diverse language-related tasks. The transition from these traditional LLM chatbots to more advanced AI agents represents a pivotal evolutionary step. However, existing evaluation frameworks often blur the distinctions between LLM chatbots and AI agents, leading to confusion among researchers selecting appropriate benchmarks. To bridge this gap, this paper introduces a systematic analysis of current evaluation approaches, grounded in an evolutionary perspective. We provide a detailed analytical framework that clearly differentiates AI agents from LLM chatbots along five key aspects: complex environment, multi-source instructor, dynamic feedback, multi-modal perception, and advanced capability. Further, we categorize existing evaluation benchmarks based on external environments, driving forces, and resulting advanced internal capabilities. For each category, we delineate relevant evaluation attributes, presented comprehensively in practical reference tables. Finally, we synthesize current trends and outline future evaluation methodologies through four critical lenses: environment, agent, evaluator, and metrics. Our findings offer actionable guidance for researchers, facilitating the informed selection and application of benchmarks in AI agent evaluation, thus fostering continued advancement in this rapidly evolving research domain.

large language model, machine learning, natural language, (17 more...)


2506.11102

Genre:

Overview (1.00)
Workflow (0.69)
Research Report > New Finding (0.34)

Industry:

Education (1.00)
Media (0.92)
Leisure & Entertainment > Games > Computer Games (0.68)
Information Technology > Software (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data

Zhang, Jihong, Liang, Xinya, Deng, Anqi, Bonge, Nicole, Tan, Lin, Zhang, Ling, Zarrett, Nicole

arXiv.org Artificial Intelligence, May-29-2025

Mixed methods research integrates quantitative and qualitative data but faces challenges in aligning their distinct structures, particularly in examining measurement characteristics and individual response patterns. Advances in large language models (LLMs) offer promising solutions by generating synthetic survey responses informed by qualitative data. This study investigates whether LLMs, guided by personal interviews, can reliably predict human survey responses, using the Behavioral Regulations in Exercise Questionnaire (BREQ) and interviews from after-school program staff as a case study. Results indicate that LLMs capture overall response patterns but exhibit lower variability than humans. Incorporating interview data improves response diversity for some models (e.g., Claude, GPT), while well-crafted prompts and low-temperature settings enhance alignment between LLM and human responses. Demographic information had less impact than interview content on alignment accuracy. These findings underscore the potential of interview-informed LLMs to bridge qualitative and quantitative methodologies while revealing limitations in response variability, emotional interpretation, and psychometric fidelity. Future research should refine prompt design, explore bias mitigation, and optimize model settings to enhance the validity of LLM-generated survey data in social science research.

large language model, machine learning, natural language, (19 more...)


2505.21997

Country: North America > United States > Arkansas (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Personal > Interview (0.89)

Industry:

Health & Medicine (1.00)
Education > Educational Setting (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


A Framework to Assess the Persuasion Risks Large Language Model Chatbots Pose to Democratic Societies

Chen, Zhongren, Kalla, Joshua, Le, Quan, Nakamura-Sakai, Shinpei, Sekhon, Jasjeet, Wang, Ruixiao

arXiv.org Artificial Intelligence, May-2-2025

In recent years, significant concern has emerged regarding the potential threat that Large Language Models (LLMs) pose to democratic societies through their persuasive capabilities. We expand upon existing research by conducting two survey experiments and a real-world simulation exercise to determine whether it is more cost-effective to persuade a large number of voters using LLM chatbots than with standard political campaign practice, taking into account both the "receive" and "accept" steps in the persuasion process (Zaller 1992). These experiments improve upon previous work by assessing extended interactions between humans and LLMs (instead of using single-shot interactions) and by assessing both short- and long-run persuasive effects (rather than simply asking users to rate the persuasiveness of LLM-produced content). In two survey experiments (N = 10,417) across three distinct political domains, we find that while LLMs are about as persuasive as actual campaign ads once voters are exposed to them, political persuasion in the real world depends on both exposure to a persuasive message and its impact conditional on exposure. Through simulations based on real-world parameters, we estimate that LLM-based persuasion costs between $48 and $74 per persuaded voter compared to $100 for traditional campaign methods, when accounting for the costs of exposure. However, it is currently much easier to scale traditional campaign persuasion methods than LLM-based persuasion. While LLMs do not currently appear to have substantially greater potential for large-scale political persuasion than existing non-LLM methods, this may change as LLM capabilities continue to improve and it becomes easier to scalably encourage exposure to persuasive LLMs. This research was deemed exempt by the Yale University Human Subjects Committee.

large language model, machine learning, persuasion, (20 more...)


2505.00036

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.94)

Industry:

Government > Immigration & Customs (1.00)
Government > Voting & Elections (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Wearable Meets LLM for Stress Management: A Duoethnographic Study Integrating Wearable-Triggered Stressors and LLM Chatbots for Personalized Interventions

Neupane, Sameer, Dongre, Poorvesh, Gracanin, Denis, Kumar, Santosh

arXiv.org Artificial Intelligence, Feb-24-2025

We use a duoethnographic approach to study how wearable-integrated LLM chatbots can assist with personalized stress management, addressing the growing need for immediacy and tailored interventions. Two researchers interacted with custom chatbots over 22 days, responding to wearable-detected physiological prompts, recording stressor phrases, and using them to seek tailored interventions from their LLM-powered chatbots. They recorded their experiences in autoethnographic diaries and analyzed them during weekly discussions, focusing on the relevance, clarity, and impact of chatbot-generated interventions. Results showed that even though most events triggered by the wearable were meaningful, only one in five warranted an intervention. They also showed that interventions tailored with brief event descriptions were more effective than generic ones. By examining the intersection of wearables and LLMs, this research contributes to developing more effective, user-centric mental health tools for real-time stress relief and behavior change.

chatbot, intervention, stressor, (12 more...)


doi: 10.1145/3706599.3720197

2502.1765

Country:

North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.87)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Can We Delegate Learning to Automation?: A Comparative Study of LLM Chatbots, Search Engines, and Books

Yang, Yeonsun, Shin, Ahyeon, Kang, Mincheol, Kang, Jiheon, Song, Jean Young

arXiv.org Artificial Intelligence, Oct-2-2024

Learning is a key motivator behind information search behavior. With the emergence of LLM-based chatbots, students are increasingly turning to these tools as their primary resource for acquiring knowledge. However, the transition from traditional resources like textbooks and web searches raises concerns among educators. They worry that these fully-automated LLMs might lead students to delegate critical steps of search as learning. In this paper, we systematically uncover three main concerns from educators' perspectives. In response to these concerns, we conducted a mixed-methods study with 92 university students to compare three learning sources with different automation levels. Our results show that LLMs support comprehensive understanding of key concepts without promoting passive learning, though their effectiveness in knowledge retention was limited. Additionally, we found that academic performance impacted both learning outcomes and search patterns. Notably, higher-competence learners engaged more deeply with content through reading-intensive behaviors rather than relying on search activities.

educator, information, participant, (16 more...)


2410.01396

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Asia > South Korea > Daegu > Daegu (0.04)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Higher Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Performance Assessment of ChatGPT vs Bard in Detecting Alzheimer's Dementia

T, Balamurali B, Chen, Jer-Ming

arXiv.org Artificial Intelligence, Jan-30-2024

Large language models (LLMs) find increasing applications in many fields. Here, three LLM chatbots (ChatGPT-3.5, ChatGPT-4 and Bard) are assessed, in their current, publicly available form, for their ability to recognize Alzheimer's Dementia (AD) and Cognitively Normal (CN) individuals using textual input derived from spontaneous speech recordings. A zero-shot learning approach is used at two levels of independent queries, with the second query (chain-of-thought prompting) eliciting more detail than the first. Each LLM chatbot's performance is evaluated on the prediction generated in terms of accuracy, sensitivity, specificity, precision and F1 score. The LLM chatbots generated a three-class outcome ("AD", "CN", or "Unsure"). When positively identifying AD, Bard produced the highest true-positives (89% recall) and the highest F1 score (71%), but tended to misidentify CN as AD with high confidence (low "Unsure" rates); for positively identifying CN, GPT-4 resulted in the highest true-negatives at 56% and the highest F1 score (62%), adopting a diplomatic stance (moderate "Unsure" rates). Overall, the three LLM chatbots identify AD vs CN at above-chance levels but do not currently satisfy the requirements of clinical application.
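The evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes a one-vs-rest reading in which an "Unsure" prediction simply counts as "not the positive class"; the abstract does not say how the authors scored "Unsure", so this handling, the function name `one_vs_rest_metrics`, and the toy labels are all hypothetical.

```python
def one_vs_rest_metrics(truth, pred, positive):
    """Compute standard metrics, treating `positive` as the target class.

    Assumption (not from the paper): any prediction other than `positive`,
    including "Unsure", counts as a negative prediction.
    """
    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty denominators on tiny samples.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    # Accuracy here means exact label agreement, so "Unsure" is never correct.
    accuracy = ratio(sum(t == p for t, p in pairs), len(pairs))
    return {
        "accuracy": accuracy,
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Toy labels invented for illustration; they are not data from the study.
truth = ["AD", "AD", "AD", "AD", "CN", "CN", "CN", "CN"]
pred = ["AD", "AD", "AD", "Unsure", "AD", "CN", "CN", "Unsure"]
metrics = one_vs_rest_metrics(truth, pred, positive="AD")
```

Calling the same function with `positive="CN"` gives the view from the other class, which is how a model can score well on identifying AD yet poorly on identifying CN.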

cn subject, llm chatbot, unsure, (14 more...)


2402.01751

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Dementia (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support

Song, Inhwa, Pendse, Sachin R., Kumar, Neha, De Choudhury, Munmun

arXiv.org Artificial Intelligence, Jan-25-2024

Research from the field of Computer-Supported Cooperative Work (CSCW), including the emergent area of Human-AI interaction, has increasingly examined the societal gaps that prevent people in need from accessing care, and analyzed how people turn to technology-mediated support to fill those gaps [14, 27, 44]. Large Language Model (LLM) chatbots have rapidly become one such tool, readily appropriated for mental health support by people experiencing severe distress with nowhere else to turn. Recent work has discussed how people in distress have turned to LLM chatbots (such as OpenAI's ChatGPT [8, 10] and Replika [28]) for mental health support, and social media users have described how LLM chatbots saved their lives [10, 47]. Following Freud and Breuer's [19] description of the beneficial nature of psychoanalysis as a "talking cure," some have called engagements with technologies for mental health a "typing cure" [22, 40, 51]. However, others have cautioned against the use of LLM chatbots for mental health support, noting that the outputs of LLM chatbots are less constrained than the rule-based chatbots of the past, with potential for harmful advice or recommendations. For example, the National Eating Disorder Association was forced to shut down their support chatbot in July 2023 after the chatbot provided harmful recommendations to users, including weight loss and dieting advice to users who may already have been struggling with disordered eating [10, 25, 75].

chatbot, llm chatbot, participant, (12 more...)


2401.14362

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > India (0.04)
South America > Brazil (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)


Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network

Li, Hanchen, Liu, Yuhan, Cheng, Yihua, Ray, Siddhant, Du, Kuntai, Jiang, Junchen

arXiv.org Artificial Intelligence, Jan-23-2024

To render each generated token in real time, the LLM server generates response tokens one by one and streams each generated token (or group of a few tokens) through the network to the user right after it is generated, which we refer to as LLM token streaming. However, under unstable network conditions, the LLM token streaming experience could suffer greatly from stalls, since one packet loss could block the rendering of tokens contained in subsequent packets even if they arrive on time. With a real-world measurement study, we show that current applications including ChatGPT, Claude, and Bard all suffer from increased stalls under unstable network conditions. For this emerging token streaming problem in LLM chatbots, we propose a novel transport-layer scheme, called Chatterbox, which puts newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and can be independently rendered when received, thus avoiding the aforementioned stalls caused by missing packets. Through simulation under various network conditions, we show Chatterbox reduces stall ratio (proportion of token rendering wait time) by 71.0% compared to the token streaming method commonly used by real chatbot applications and by 31.6% compared to a custom packet duplication scheme. By tailoring Chatterbox to fit the token-by-token generation of LLMs, we enable chatbots to respond like an eloquent speaker for users to better enjoy pervasive AI.
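The packing idea described in this abstract (each outgoing packet carries the new tokens plus every token not yet acknowledged) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Packet`, `Sender`, `Receiver`, and `stream` names are hypothetical, acknowledgements are assumed to arrive instantly and never get lost, and real transport concerns such as packet size limits and timing are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    start: int    # index of the first token carried in this packet
    tokens: list  # unacknowledged tokens followed by the new ones


@dataclass
class Sender:
    sent: list = field(default_factory=list)  # every token generated so far
    acked: int = 0                            # tokens [0, acked) are acknowledged

    def make_packet(self, new_tokens):
        self.sent.extend(new_tokens)
        # Carry everything not yet acknowledged, which includes the new tokens.
        return Packet(start=self.acked, tokens=self.sent[self.acked:])

    def on_ack(self, upto):
        self.acked = max(self.acked, upto)


@dataclass
class Receiver:
    rendered: list = field(default_factory=list)

    def on_packet(self, pkt):
        # Skip tokens already rendered; append only the unseen suffix.
        have = len(self.rendered)
        if pkt.start <= have:
            self.rendered.extend(pkt.tokens[have - pkt.start:])
        return len(self.rendered)  # cumulative acknowledgement


def stream(chunks, lost):
    """Send one packet per chunk; packets whose index is in `lost` are dropped."""
    sender, receiver = Sender(), Receiver()
    for i, chunk in enumerate(chunks):
        pkt = sender.make_packet(chunk)
        if i in lost:
            continue  # dropped packet: nothing rendered, no ack returned
        sender.on_ack(receiver.on_packet(pkt))
    return receiver.rendered


chunks = [["The"], [" cat"], [" sat"], [" down"]]
recovered = stream(chunks, lost={1})
```

In this toy run the second packet is dropped, yet the third packet still carries the unacknowledged token alongside its own, so the receiver can render both as soon as it arrives instead of waiting for a retransmission. A loss of the final packet would still leave its tokens undelivered until some later packet is sent.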

chatterbox, packet, stall, (16 more...)


2401.12961

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > San Diego County > Carlsbad (0.04)

Genre: Research Report (0.82)

Industry: Telecommunications > Networks (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception

Kumar, Harsh, Musabirov, Ilya, Reza, Mohi, Shi, Jiakai, Wang, Xinyuan, Williams, Joseph Jay, Kuzminykh, Anastasia, Liut, Michael

arXiv.org Artificial Intelligence, Jan-23-2024

Personalized chatbot-based teaching assistants can be crucial in addressing increasing classroom sizes, especially where direct teacher presence is limited. Large language models (LLMs) offer a promising avenue, with increasing research exploring their educational utility. However, the challenge lies not only in establishing the efficacy of LLMs but also in discerning the nuances of interaction between learners and these models, which impact learners' engagement and results. We conducted a formative study in an undergraduate computer science classroom (N=145) and a controlled experiment on Prolific (N=356) to explore the impact of four pedagogically informed guidance strategies on the learners' performance, confidence and trust in LLMs. Direct LLM answers marginally improved performance, while refining student solutions fostered trust. Structured guidance reduced random queries as well as instances of students copy-pasting assignment questions to the LLM. Our work highlights the role that teachers can play in shaping LLM-supported learning environments.

interaction, llm, student, (16 more...)


2310.13712

Country:

North America > Canada > Ontario > Toronto (0.16)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Higher Education (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Education > Assessment & Standards (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
